UCB Algorithm and Adversarial Bandit Problem
Abstract
Review of the stochastic multi-armed bandit problem. Consider a gambler playing against an N-armed slot machine who sequentially pulls arms so as to minimize the expected regret. Each arm i ∈ {1, ..., N} has a probability distribution D_i supported on the closed interval [0, 1]. Let μ_i be the expected loss of arm i and define Ĩ = argmin_{1≤j≤N} μ_j as the index of the arm with the smallest expected loss. Suppose that the gambler selects arm I_t at round t; then the expected regret of the gambler up to round T is defined as

R_T = E[ ∑_{t=1}^{T} μ_{I_t} ] − T μ_{Ĩ}.
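To make the setup concrete, the following is a minimal sketch of the UCB1 index policy adapted to this loss-minimization setting, where the optimistic index becomes a lower confidence bound on the loss. The Bernoulli loss distributions, their means, and the horizon are hypothetical choices for illustration, not values from the text.

```python
# A minimal sketch of UCB1 for the loss-minimization setting above:
# N arms with loss distributions on [0, 1]; the gambler plays the arm
# with the smallest lower confidence index.  Bernoulli losses with
# hypothetical means stand in for the unspecified D_i.
import math
import random

def ucb1_losses(loss_means, T, seed=0):
    rng = random.Random(seed)
    N = len(loss_means)
    counts = [0] * N          # pulls per arm
    mean_loss = [0.0] * N     # empirical mean loss per arm

    def pull(i):
        # Bernoulli loss with mean loss_means[i] (stand-in for D_i).
        return 1.0 if rng.random() < loss_means[i] else 0.0

    total_loss = 0.0
    for t in range(1, T + 1):
        if t <= N:
            arm = t - 1  # play each arm once to initialize
        else:
            # Lower confidence bound: empirical mean minus exploration bonus.
            arm = min(range(N),
                      key=lambda i: mean_loss[i]
                      - math.sqrt(2 * math.log(t) / counts[i]))
        loss = pull(arm)
        counts[arm] += 1
        mean_loss[arm] += (loss - mean_loss[arm]) / counts[arm]
        total_loss += loss

    # Realized regret against the best fixed arm, mirroring R_T above.
    return total_loss - T * min(loss_means)

print(ucb1_losses([0.3, 0.5, 0.55], T=10_000))
```

Playing each arm once before applying the index rule is the standard initialization; the returned quantity compares the realized cumulative loss against T times the best mean, mirroring the regret definition above.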
Similar Papers
UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
ABSTRACT. In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · K log(T)/∆, where ∆ measures the distance between a suboptimal arm and the optimal arm ...
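Written as a display, the bound quoted above reads as follows (a LaTeX rendering; R_T denotes the regret after T trials and ∆ the gap between a suboptimal arm and the optimal one):

```latex
% Regret bound for the original UCB algorithm in a K-armed bandit,
% as quoted in the abstract above.
\[
  R_T \;\le\; \mathrm{const} \cdot \frac{K \log T}{\Delta}
\]
```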
Regional Multi-Armed Bandits
We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when the player selects an arm at each time slot, information about other arms in the same group is also revealed. This regional bandit model naturally bridges the non...
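A toy sketch of that group structure may help: if arms in a group share one unknown parameter, a sample from any arm is informative about all of its siblings. The linear link μ_i = c_i · θ_g, the coefficients, and the noise level below are hypothetical, chosen purely for illustration.

```python
# Toy sketch of the regional structure: arms in the same group share one
# unknown parameter theta_g, so one pull informs estimates for siblings.
import random

rng = random.Random(1)
groups = {"g1": {"theta": 0.7, "arms": {"a": 0.5, "b": 0.9}},  # arm -> known coeff c_i
          "g2": {"theta": 0.3, "arms": {"c": 1.0}}}

def pull(group, arm):
    g = groups[group]
    mean = g["arms"][arm] * g["theta"]      # mu_i = c_i * theta_g (assumed link)
    return mean + rng.gauss(0.0, 0.05)      # noisy observed reward

# One pull of arm "a" yields an estimate of theta for group g1 ...
obs = pull("g1", "a")
theta_hat = obs / groups["g1"]["arms"]["a"]
# ... which immediately gives a reward estimate for sibling arm "b".
mu_b_hat = groups["g1"]["arms"]["b"] * theta_hat
print(theta_hat, mu_b_hat)
```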
Cornering Stationary and Restless Mixing Bandits with Remix-UCB
We study the restless bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by ‘ignoring’ the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off,...
Budget-Constrained Multi-Armed Bandits with Multiple Plays
We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly K out of N possible arms have to be played (with 1 ≤ K ≤ N ). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a-priori defined budget B. The ga...
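As a concrete picture of one such round, here is a minimal sketch under the setting just described: exactly K of the N arms are played, individual rewards are observed, and the revealed cost vector is charged against the budget B. The uniform-random arm choice, the reward and cost distributions, and the stopping rule are placeholder assumptions, not the paper's algorithm.

```python
# Minimal sketch of budgeted rounds with multiple plays: exactly K of N
# arms per round, individual rewards observed, costs charged against B.
import random

rng = random.Random(0)
N, K, B = 5, 2, 20.0

budget, total_reward, rounds = B, 0.0, 0
while True:
    arms = rng.sample(range(N), K)              # play exactly K distinct arms
    rewards = [rng.random() for _ in arms]      # observed individual rewards
    costs = [0.5 + rng.random() for _ in arms]  # revealed cost vector
    if sum(costs) > budget:                     # one natural stopping rule:
        break                                   # stop once B cannot cover a round
    budget -= sum(costs)
    total_reward += sum(rewards)
    rounds += 1

print(rounds, total_reward, budget)
```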
Machine Learning Approaches for Interactive Verification
Interactive verification is a new problem, closely related to active learning, which aims to query as many positive instances as possible within some limited query budget. We point out the similarity between interactive verification and another machine learning problem called contextual bandit. The similarity allows us to design interactive verification approaches from existing contextual ...